Modeling Prosodic Structures in Linguistically Enriched Environments

Authors

  • Gerasimos Xydas
  • Dimitris Spiliotopoulos
  • Georgios Kouroupetroglou
Abstract

A significant challenge in Text-to-Speech (TtS) synthesis is the formulation of the prosodic structures (phrase breaks, pitch accents, phrase accents and boundary tones) of utterances. Robust prediction of these elements relies on the accuracy and quality of error-prone linguistic procedures, such as the identification of the part of speech and the syntactic tree. Additional linguistic factors, such as rhetorical relations, improve the naturalness of the prosody but are hard to extract from plain text. In this work, we propose a method to generate enhanced prosodic events for TtS by utilizing accurate, error-free, high-level linguistic information. We also present an XML annotation scheme that encodes syntax, grammar, new or given information, phrase subject/object information, and rhetorical elements. These linguistically enriched documents have been used to build realistic machine learning models for predicting prosodic structures in terms of segmental information and ToBI marks. The methodology has been applied by exploiting a Natural Language Generator (NLG) system. The models were trained using classification via regression trees, and the results strongly indicate a realistic effect on the generated prosody. The approach was evaluated by comparing the models produced from the enriched documents with those produced from plain text of the same domain. The results show an accuracy improvement of up to 23%.

Similar resources

Modeling Improved Prosody Generation from High-Level Linguistically Annotated Corpora

Synthetic speech usually suffers from bad F0 contour surface. The prediction of the underlying pitch targets robustly relies on the quality of the predicted prosodic structures, i.e. the corresponding sequences of tones and breaks. In the present work, we have utilized a linguistically enriched annotated corpus to build data-driven models for predicting prosodic structures with increased accura...

Prosody Prediction from Linguistically Enriched Documents Based on a Machine Learning Approach

One of the main aspects of text-to-speech synthesis is the successful prediction of prosodic events. In this work we deal with the prediction of prosodic phrase breaks, accent tones and boundary tones from a linguistically enriched, XML-based input (SOLE-ML) produced by a Natural Language Generator (NLG) system. We first extended the original specification of SOLE-ML in order for the NLG to prod...

Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative text-to-speech synthesis

Prosody is an important factor in the quality of text-to-speech (TTS) synthesis. Typically, acoustic parameters such as f0 and duration are the only variables related to prosody that are used to determine unit selection. Our study explored adding the explicit use of linguistically and perceptually motivated prosodic categories in unit selection-based TTS. One of our goals was to automate the pro...

The psi/phi architecture for prosodic parsing

In this paper an architecture and an implementation for a linguistically based prosodic analyser is presented. The implementation is designed to handle typical prosodic input in the form of parallel input channels, and processes each input channel independently in a data-directed, phonologically motivated configuration of partly parallel, partly cascaded feature modules and module clusters, eac...

Prosodically Enriched Text Annotation for High Quality Speech Synthesis

Linguistically enriched text generated from natural language modules contributes significantly to the quality of speech synthesis. For all cases where such modules are not available, such enriched input needs to be produced from plain text in order to maintain quality. This work reports on a framework of several combined language resources and procedures (word/sentence identification, syntactic...

Publication date: 2004